
Haitao Mi

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation

May 06, 2025
Via arXiv

WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

Apr 23, 2025
Via arXiv

Enhancing Web Agents with Explicit Rollback Mechanisms

Apr 16, 2025
Via arXiv

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Apr 15, 2025
Via arXiv

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Apr 01, 2025
Via arXiv

Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique

Mar 21, 2025
Via arXiv

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Mar 04, 2025
Via arXiv

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Feb 24, 2025
Via arXiv

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Jan 30, 2025
Via arXiv

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Dec 30, 2024
Via arXiv